ABSTRACT

Statistical dependencies among wavelet coefficients are commonly
represented by graphical models such as hidden Markov trees
(HMTs). However, in linear inverse problems such as deconvolution,
tomography, and compressed sensing, the presence of a
sensing or observation matrix produces a linear mixing of the simple
Markovian dependency structure. This leads to reconstruction
problems that are non-convex optimizations. Past work has dealt
with this issue by resorting to greedy or suboptimal iterative reconstruction
methods. In this paper, we propose new modeling
approaches based on group-sparsity penalties that lead to convex
optimizations that can be solved exactly and efficiently. We show
that the methods we develop perform significantly better in deconvolution
and compressed sensing applications, while being as
computationally efficient as standard coefficient-wise approaches
such as the lasso.

Index Terms: wavelet modeling, deconvolution, compressed sensing



1. INTRODUCTION 

Statistical dependencies among wavelet coefficients are commonly
represented by trees or graphical models such as hidden Markov
trees (HMTs) [9]. HMTs provide superior denoising results compared
to independent coefficient-wise thresholding/shrinkage methods,
such as the lasso [15]. Fast exact and/or approximate inference
algorithms exist in many situations, but not all. In linear inverse
problems (e.g., deconvolution, tomography, and compressed sensing),
the presence of a sensing/observation matrix can linearly mix
the Markovian dependency structure so that simple and exact inference
algorithms no longer exist. Past work has dealt with this
issue by resorting to greedy or suboptimal iterative reconstruction
methods such as those based on belief propagation [10], iterative
reweighting [7], or variants of Orthogonal Matching Pursuit [3, 2]
(see also [13, 8]). In this paper, we propose a new modeling approach
based on group-sparsity penalties that leads to convex optimizations
that can be solved exactly and efficiently. Our results show that the
approach performs much better in deblurring and compressed sensing
applications, while being as computationally efficient as standard
coefficient-wise approaches. Our work uses the group lasso
with overlap formulation introduced in [4], which we further modify
to better represent dependencies among wavelet coefficients. Note
that we use the ℓ1/ℓ2 norm formulation; similar work could
be performed using the ℓ1/ℓ∞ norm [1].

We motivate our problem in Section 2. In Section 3 we explain
how we organize the wavelet transform coefficients into overlapping
groups. Section 4 outlines the experiments we performed and the
results obtained. We conclude the paper in Section 5.
2. PROBLEM FORMULATION

Consider a linear observation model which can represent blurring, 
tomographic projection, or compressed sensing: 

$$y = Lx + w,$$

where y is the measured data, L is a linear observation operator, x 
is the image to be reconstructed, and w is additive Gaussian noise. 
Throughout the paper we will assume a standard matrix-vector representation:
the image is represented as a column vector and linear
operators are represented as matrices. Images typically have approximately
sparse representations in the wavelet domain, and this has
led to many approaches to image reconstruction that attempt to exploit
this property [6, 14]. For example, the standard ℓ2/ℓ1 or lasso
[15] reconstruction problem is written as

$$\hat{\theta} := \arg\min_{\theta} \left\{ \frac{1}{2}\|y - A\theta\|_2^2 + \lambda_l \|\theta\|_1 \right\} \qquad (1)$$

where A := LW is the composition of L and the inverse wavelet
transform W, θ denotes a set of wavelet coefficients, and λ_l > 0 is
a regularization parameter that balances the tradeoff between fitting
the data y and minimizing the ℓ1 norm of θ (which serves as a
surrogate for sparsity).
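Problem (1) can be solved by iterative shrinkage/thresholding, because the proximal operator of the ℓ1 norm is entrywise soft thresholding. The following is a minimal Python/NumPy sketch of plain ISTA with a fixed step size, assuming A is available as a dense matrix (for images, A = LW would normally be applied as an operator instead). It is not the SpaRSA solver [14] used in the experiments, and the function names are our own.

```python
import numpy as np

def soft_threshold(v, t):
    """Entrywise soft thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, n_iter=500):
    """Minimize 0.5*||y - A theta||_2^2 + lam*||theta||_1 with plain ISTA.

    The step size is 1/lip, where lip = ||A||_2^2 (largest singular value
    squared) is the Lipschitz constant of the gradient of the data term.
    """
    lip = np.linalg.norm(A, 2) ** 2
    theta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ theta - y)                 # gradient of the data term
        theta = soft_threshold(theta - grad / lip, lam / lip)
    return theta
```

With A equal to the identity (the denoising case), a single iteration already returns the soft-thresholded data, e.g. `lasso_ista(np.eye(3), np.array([3.0, -0.5, 1.0]), 1.0)` gives `[2, 0, 0]`.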

The lasso penalty reflects the fact that the wavelet coefficients are
approximately sparse, but in reality not all patterns of sparsity are
equally plausible. A commonly observed effect is the persistence
of large (or small) wavelet coefficients across scales, due to
the localized nature of edges. Many models have been proposed
to represent such patterns, and tree-structured models in particular
have been among the most successful and widely used (e.g., hidden
Markov trees [9]). Tree-structured models admit very efficient estimation
procedures based on pruning or message-passing algorithms
in denoising applications, where A is the identity operator, but when
A is not the identity such simple strategies can no longer be applied. In
fact, the optimization problem resulting from tree models is in general
non-convex (unlike (1) above), so exact solutions
are difficult or practically impossible to obtain. This issue has been addressed
by resorting to greedy or suboptimal iterative reconstruction
procedures [10, 7, 13, 8, 3, 2].

This motivates the main idea and contribution of this paper. While
tree models can represent the patterns of sparsity (or approximate
sparsity) in the wavelet coefficients of natural images, they are not
the only way to capture such effects. We are particularly interested
in modeling the so-called "parent-child dependency" [5], which
refers to the persistence of large/small wavelet coefficients
across scales. Specifically, if a wavelet coefficient at a certain
spatial location and scale is large/small, then its "neighboring" coefficients
at roughly the same location but finer or coarser scales tend
to be large/small as well. The term parent-child refers to a pair of coefficients
at the same location and adjacent scales. Our goal is to exploit
the fact that the coefficients in each pair are typically both large or
both small (or zero) in magnitude. This can be accomplished using
an overlapping-group penalty function [5, 12, 11] that generalizes
the lasso in a way that captures parent-child dependencies while
retaining its convex nature.

Fig. 1 depicts the wavelet quadtree structure and examples of parent-child
groups. Each (non-leaf) coefficient has an associated orientation
(horizontal, vertical, or diagonal) and four child coefficients of
the same orientation at the next finer scale. Many options exist
for grouping parents with children, and two of them are depicted in
the figure. Many other grouping schemes are also possible in our
framework (e.g., including "grandparent" coefficients as well), but
we do not explore such extensions in this paper.
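The two grouping options of Fig. 1 can be enumerated mechanically. Below is a minimal sketch for a single orientation subband, assuming a hypothetical numbering in which coefficients are stored level by level (coarsest first) in row-major order; the paper does not prescribe an indexing, and `quadtree_groups` is an illustrative name.

```python
def quadtree_groups(n_levels, scheme="family"):
    """Parent-child groups for one orientation subband of a 2-D DWT quadtree.

    Level j (0 = coarsest) holds a 2^j x 2^j array of coefficients, and
    coefficient (j, r, c) has the four children (j+1, 2r+dr, 2c+dc), dr, dc in {0, 1}.
    scheme="family": one group per parent, {parent + its 4 children} (Fig. 1, above).
    scheme="pair":   one group per parent-child edge (Fig. 1, below).
    """
    def index(j, r, c):
        offset = (4 ** j - 1) // 3          # number of coefficients in levels < j
        return offset + r * 2 ** j + c

    groups = []
    for j in range(n_levels - 1):           # the finest level has no children
        for r in range(2 ** j):
            for c in range(2 ** j):
                parent = index(j, r, c)
                children = [index(j + 1, 2 * r + dr, 2 * c + dc)
                            for dr in (0, 1) for dc in (0, 1)]
                if scheme == "family":
                    groups.append([parent] + children)
                elif scheme == "pair":
                    groups.extend([parent, ch] for ch in children)
                else:
                    raise ValueError(f"unknown scheme: {scheme}")
    return groups
```

For three levels (1, 4, and 16 coefficients), the "family" scheme yields 5 groups of size 5 and the "pair" scheme yields 20 groups of size 2; in both, every non-root coefficient appears as a child exactly once.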







Fig. 1. Quadtree corresponding to the 2-D DWT. At each scale, parent
coefficients can be grouped with child coefficients. All four children
may be grouped together with the parent (above), or the parent
can be grouped with each child individually (below).

The main contributions of the paper are threefold: 1) we introduce
a new approach to representing the wavelet coefficient sparsity patterns
commonly observed in natural images; 2) we adapt and extend
recently proposed methods for group lasso with overlaps in order to
take advantage of these sparsity patterns in linear inverse problems
using simple convex optimization techniques; 3) we demonstrate
fast and efficient reconstruction in image deblurring and compressed
sensing, with significant improvements in reconstruction error relative
to the lasso.



3. CONVEX GROUP REGULARIZERS 

To encourage solutions whose wavelet coefficient sparsity patterns
reflect the parent-child group structure and the persistence of
large/small coefficients across scales, we apply the group lasso
regularization:

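$$\hat{\theta} := \arg\min_{\theta} \left\{ \frac{1}{2}\|y - A\theta\|_2^2 + \lambda_{ogl} \sum_{g \in \mathcal{G}} \|\theta_g\|_2 \right\}, \qquad (2)$$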

where 𝒢 denotes the collection of all parent-child coefficient groups
and g denotes one such group. The quantity θ_g is the subvector of
θ consisting of the coefficients in group g. The group lasso penalty enforces
group sparsity by setting whole groups to zero if the ℓ2 norm of
the group is small relative to its importance in the data-fitting term.
In most applications of the group lasso, the groups are assumed to be
disjoint, but in our case they overlap, and hence the penalty
terms are coupled. Because of this coupling, standard group lasso
optimization strategies (e.g., [14]) cannot be directly applied. We
offer two approaches to deal with this issue.
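For disjoint groups, the proximal operator of the group penalty has a closed form (block soft thresholding), and this is exactly the step that sets whole groups to zero. A minimal sketch, assuming groups are given as lists of indices (the helper name is ours). When groups overlap, applying this per-group formula no longer computes the proximal operator of the summed penalty, which is the coupling problem described above.

```python
import numpy as np

def group_soft_threshold(v, groups, t):
    """Proximal operator of t * sum_g ||v_g||_2 for DISJOINT groups.

    Each group is shrunk toward zero by t in l2 norm; a group whose norm is
    at most t is set exactly to zero. Coordinates outside every group are
    unpenalized and returned unchanged.
    """
    out = np.array(v, dtype=float)
    for g in groups:
        norm = np.linalg.norm(v[g])
        out[g] = 0.0 if norm <= t else (1.0 - t / norm) * v[g]
    return out
```

For instance, with t = 1 the group (3, 4) (norm 5) is scaled by 0.8 to (2.4, 3.2), while the group (0.3, 0.4) (norm 0.5) is zeroed.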

3.1. Variable Replication Approach 

One way to deal with overlapping groups is to introduce replicates
of each coefficient so that each group involving a certain coefficient has
its own "copy" of it. This "decouples" the overlapping groups from
each other. This approach was proposed and analyzed in [4]. The
replication of variables (and, correspondingly, of the columns of the matrix
A) results in a formulation that can be expressed as

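$$\hat{\theta}_{oglr} := \arg\min_{\tilde{\theta}} \left\{ \frac{1}{2}\|y - \tilde{A}\tilde{\theta}\|_2^2 + \lambda_{rep} \sum_{g \in \mathcal{G}} \|\tilde{\theta}_g\|_2 \right\}, \qquad (3)$$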

where θ̃ is the extended vector with replicates and Ã is the matrix obtained
by replicating the corresponding columns of A. Because the
penalty function is now separable, computationally efficient iterative
shrinkage/thresholding methods can be applied [14].
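The replication step can be written with a 0/1 matrix P (an assumed construction consistent with the description above): P[i, k] = 1 when replicate k is a copy of coefficient i, so Ã = AP and Ãθ̃ = A(Pθ̃). The coefficient estimate is therefore the sum of each coefficient's copies. The sketch below builds Ã and solves (3) with plain ISTA and a fixed step, as a simple stand-in for SpaRSA [14]; the function names are hypothetical, and every coefficient is assumed to belong to at least one group.

```python
import numpy as np

def replicate(A, groups):
    """Column replication for overlapping groups, as in (3).

    Returns (A_tilde, P, rep_groups):
      P (n x m, m = sum of group sizes): P[i, k] = 1 iff replicate k copies coefficient i
      A_tilde = A @ P: column i of A repeated once per group containing i
      rep_groups: disjoint index lists into the replicated vector, one per group
    """
    n = A.shape[1]
    m = sum(len(g) for g in groups)
    P = np.zeros((n, m))
    rep_groups, k = [], 0
    for g in groups:
        rep_groups.append(list(range(k, k + len(g))))
        for i in g:
            P[i, k] = 1.0
            k += 1
    return A @ P, P, rep_groups

def oglr_ista(A, y, groups, lam, n_iter=1000):
    """Plain ISTA on the replicated problem (3); returns theta = P @ theta_tilde."""
    A_t, P, rep_groups = replicate(A, groups)
    lip = np.linalg.norm(A_t, 2) ** 2           # Lipschitz constant of the gradient
    t = lam / lip
    z = np.zeros(A_t.shape[1])
    for _ in range(n_iter):
        v = z - A_t.T @ (A_t @ z - y) / lip     # gradient step on the data term
        for g in rep_groups:                    # groups are disjoint: separable shrinkage
            nrm = np.linalg.norm(v[g])
            v[g] = 0.0 if nrm <= t else (1.0 - t / nrm) * v[g]
        z = v
    return P @ z                                # sum of each coefficient's copies
```

With groups [[0, 1], [1, 2]] over three coefficients, coefficient 1 receives two copies, so Ã has four columns, the middle two both equal to the second column of A.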

3.2. Constraint-Based Approach 

The overlap group lasso with the replication strategy treats each
group independently of the others. This means that we can have
a grandparent-parent coefficient group selected but not the
parent-child group, violating the persistence of wavelet coefficients across
multiple scales [9]. This motivates the use of a penalty that tends to
cause all the coefficients at a location across scales to have similar
values. To this end, we modify (3) as

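$$\hat{\theta}_{ogl} := \arg\min_{\theta,\,\tilde{\theta}} \left\{ \frac{1}{2}\|y - A\theta\|_2^2 + \lambda_{ogl} \sum_{g \in \mathcal{G}} \|\tilde{\theta}_g\|_2 + \frac{\tau}{2} \sum_{i} \sum_{j \in J_i} \left(\theta_i - \theta_i^{(j)}\right)^2 \right\} \qquad (4)$$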

where θ_i is the "master copy" of the i-th coefficient and θ_i^{(j)}
denote the copies of it that appear in the group penalty terms; J_i is the set of
replicates of θ_i. Setting τ > 0 large forces the replicated
copies to agree, yielding a solution to the group lasso in (2). This
encourages a stronger degree of persistence across scales. To the best
of our knowledge, this approach has not been previously proposed
for the overlapping group lasso problem. Note that the additional
quadratic penalty can be combined with the quadratic data-fitting
term to obtain a quadratic term plus a separable group sparsity penalty, and
we can then directly apply standard solvers such as [14]. Henceforth
we use L to denote the lasso, OGL to denote the overlap group lasso
(corresponding to (2), and to (4) with τ ≫ 1), and OGLR to denote
the overlap group lasso with replication (equation (3)).
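The remark that the coupling term can be merged with the data-fitting term can be made concrete by stacking. Using an assumed 0/1 replication matrix P (P[i, k] = 1 when copy k belongs to coefficient i), the coupling sum in (4) equals (τ/2)‖P^T θ − θ̃‖₂², so both quadratic terms form a single least-squares term in z = [θ; θ̃]. A minimal sketch (the helper name is hypothetical):

```python
import numpy as np

def augmented_system(A, y, P, tau):
    """Stack the data term and the coupling term of (4) into one least-squares term.

    z = [theta; theta_rep]: master copies theta (length n), replicates theta_rep (length m).
    P (n x m) has P[i, k] = 1 iff replicate k is a copy of coefficient i, so P.T @ theta
    lists each replicate's master value, and
        0.5*||y - A theta||^2 + (tau/2)*||P.T theta - theta_rep||^2 = 0.5*||y_aug - B z||^2
    with B = [[A, 0], [sqrt(tau) P.T, -sqrt(tau) I]] and y_aug = [y; 0].
    """
    m = P.shape[1]
    s = np.sqrt(tau)
    top = np.hstack([A, np.zeros((A.shape[0], m))])
    bottom = np.hstack([s * P.T, -s * np.eye(m)])
    return np.vstack([top, bottom]), np.concatenate([y, np.zeros(m)])
```

The remaining penalty λ_ogl Σ_g ‖θ̃_g‖₂ acts only on the replicate block of z with disjoint groups, while the master copies θ are unpenalized, so a proximal step shrinks the replicates group-wise and leaves θ untouched. Note that ‖B‖₂² grows with τ, so with a fixed step size 1/‖B‖₂², very large τ slows down a plain ISTA iteration.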

3.3. Sparsity Patterns and Penalties 

To demonstrate the effect of the group penalties in contrast to the
usual ℓ1 penalty, we compare their values for the 'cameraman'
image. Fig. 2 shows the wavelet coefficients of the image in the
standard organization (left) and in a randomized organization (right).
The ℓ1 norm is invariant to the randomization, but the group penalties
change because parent-child dependencies are not preserved.
To quantify the degree to which the group penalties encourage
parent-child persistence, we compute the ratio of each group penalty
for the left image to that for the randomly organized image on the right. The
group penalties are larger when the parent-child relationships are lost
due to randomization, so the ratios are less than 1 for the group
penalties. The OGL penalty ratio is smaller than the OGLR (with
replication) penalty ratio, indicating that the OGL penalty in (2) more
strongly favors the structure in this image than the OGLR
penalty in (3) or the ℓ1 penalty in the lasso. Next, we show experimental
evidence that OGL produces better reconstructions.
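The ratios reported with Fig. 2 compare each penalty on the structured coefficients with the same penalty on a scrambled copy. A minimal sketch for the ℓ1 and OGL penalties, assuming a flat coefficient vector and group index lists (helper names are hypothetical). The OGLR penalty is omitted because evaluating it requires minimizing over how each coefficient is split among its copies.

```python
import numpy as np

def l1_penalty(theta):
    return np.sum(np.abs(theta))

def group_penalty(theta, groups):
    """Overlapping group penalty sum_g ||theta_g||_2 (the OGL penalty in (2))."""
    return sum(np.linalg.norm(theta[g]) for g in groups)

def penalty_ratios(theta, groups, seed=0):
    """penalty(structured) / penalty(scrambled) for the l1 and group penalties.

    Scrambling applies a random permutation to the coefficient positions: the
    multiset of values is kept, but parent-child relationships are destroyed.
    """
    rng = np.random.default_rng(seed)
    scrambled = rng.permutation(theta)
    return {
        "l1": l1_penalty(theta) / l1_penalty(scrambled),
        "group": group_penalty(theta, groups) / group_penalty(scrambled, groups),
    }
```

The ℓ1 ratio is always 1 (up to rounding), since a permutation does not change the sum of magnitudes; only the group ratio reflects where the large coefficients sit in the tree.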





[Figure panels: (a) Haar DWT of the Cameraman; (b) Scrambled DWT]

Fig. 2. Ratio of sparsity penalty of (a) to (b): lasso 1.00; group
lasso 0.70; group lasso with replication 0.85. The group penalties
significantly favor the structured sparsity pattern of (a).

[Figure panels: (a) lasso reconstruction (MSE=0.0043); (b) OGLR reconstruction (MSE=0.0031); (c) lasso deblurring (MSE=0.010); (d) OGLR deblurring (MSE=0.007)]

Fig. 4. Performance on the cameraman image.



4. EXPERIMENTS AND RESULTS 



We evaluate the proposed approaches on 1-dimensional signals, toy
images (samples are shown in Fig. 3), and a real image. We used SpaRSA [14],
modified to handle the replicated overlapping groups and the associated
constraint-based formulation, to solve (3) and (4). We used the Haar
wavelet basis as the sparsity-inducing transform. Groups were defined
according to the methods explained in Section 2.
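The Haar transform and parent-child groups used for 1-D signals fit in a few lines. A minimal sketch, assuming an orthonormal, full-depth Haar DWT with coefficients stored coarsest first; in this layout detail coefficient i has children 2i and 2i+1, so each group in the style of Fig. 1(b) is simply a pair [i, 2i] or [i, 2i+1]. The signal generator mimics the piecewise-constant test signals of Section 4.2, but the distribution of the jump count and the Gaussian jump heights are our assumptions, and all function names are illustrative.

```python
import numpy as np

def haar_1d(x):
    """Orthonormal full-depth 1-D Haar DWT of a signal of length 2^J.

    Layout (coarsest first): [scaling coeff, 1 level-0 detail, 2 level-1 details,
    ..., 2^(J-1) finest details].
    """
    a = np.asarray(x, dtype=float)
    details = []
    while a.size > 1:
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))   # finest level is appended first
        a = (even + odd) / np.sqrt(2)
    return np.concatenate([a] + details[::-1])

def inverse_haar_1d(c):
    """Inverse of haar_1d."""
    c = np.asarray(c, dtype=float)
    a, pos = c[:1], 1
    while pos < c.size:
        d = c[pos:pos + a.size]
        pos += a.size
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a

def binary_tree_pair_groups(n):
    """One group per parent-child edge of the Haar tree (cf. Fig. 1(b)).

    In the haar_1d layout, detail coefficient i >= 1 has children 2i and 2i+1;
    index 0 (the scaling coefficient) is not grouped.
    """
    return [[i, child] for i in range(1, n // 2) for child in (2 * i, 2 * i + 1)]

def random_piecewise_constant(n, max_jumps, rng):
    """Piecewise-constant signal with at most max_jumps jumps at uniform random positions.

    Assumed details: jump count uniform on {0, ..., max_jumps}, jump heights N(0, 1).
    """
    n_jumps = rng.integers(0, max_jumps + 1)
    positions = rng.choice(np.arange(1, n), size=n_jumps, replace=False)
    x = np.zeros(n)
    for p in positions:
        x[p:] += rng.standard_normal()
    return x
```

For a length-1024 signal this yields 1022 pair groups (one per edge of the 1023-node detail tree), and each jump affects at most one detail coefficient per level.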



Fig. 3. Sample of toy images used for testing 

To illustrate the potential of the proposed methods, we first consider
compressed sensing and deconvolution results for the cameraman
image, resized to 128 × 128. Fig. 4(a) and Fig. 4(b) show the
compressed sensing results. The image was undersampled using a
random i.i.d. Gaussian matrix, taking only 800 measurements for every
64 × 64 subimage. Fig. 4(c) and Fig. 4(d) show the results of deblurring
the image after it was blurred with a Gaussian kernel of variance 1. The measurements
in both cases were corrupted by white Gaussian noise (WGN) of variance 1.
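The two observation models of this experiment can be sketched as follows, assuming periodic (circular) boundary conditions for the blur, a kernel truncated at radius 4, and a measurement matrix with unit-variance i.i.d. Gaussian entries; none of these details are specified in the text, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel_2d(variance=1.0, radius=4):
    """Normalized 2-D Gaussian kernel exp(-(u^2 + v^2) / (2*variance)), |u|, |v| <= radius."""
    u = np.arange(-radius, radius + 1)
    g = np.exp(-u ** 2 / (2.0 * variance))
    k = np.outer(g, g)
    return k / k.sum()

def blur_circular(img, kernel):
    """Circular 2-D convolution via the FFT (periodic boundary is an assumption)."""
    pad = np.zeros_like(img, dtype=float)
    r = kernel.shape[0] // 2
    pad[:kernel.shape[0], :kernel.shape[1]] = kernel
    pad = np.roll(pad, (-r, -r), axis=(0, 1))       # move the kernel center to (0, 0)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(pad)))

def cs_measure(block, m=800, noise_var=1.0, rng=None):
    """Noisy Gaussian measurements y = Phi x + w of a vectorized image block."""
    rng = np.random.default_rng() if rng is None else rng
    x = block.reshape(-1)                            # a 64 x 64 block gives 4096 pixels
    Phi = rng.standard_normal((m, x.size))
    y = Phi @ x + np.sqrt(noise_var) * rng.standard_normal(m)
    return Phi, y
```

Because the kernel is normalized to sum to 1, blurring a constant image leaves it unchanged; in the compressed sensing case, the operator L of Section 2 corresponds to Phi.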



[Plot: reconstruction error in the presence of noise; horizontal axis: noise variance]

Fig. 5. Effect of varying the noise variance. Groups are formed
according to Fig. 1(a).



4.1. Varying the noise 



To evaluate the improvements due to the group penalties, we first consider
performance as a function of the signal-to-noise ratio. Our first set of
experiments involved testing the recovery scheme in a compressed
sensing framework (using an i.i.d. Gaussian measurement matrix of
size 800 × 4096). The noise variance was varied up to 1 in steps
of 0.1, and we measured the mean reconstruction error ‖θ* − θ̂‖.
Results obtained using the toy images are plotted in Figs. 5 and 6.
The results are averaged over 100 independent trials at each noise
level; a random toy image similar to those in Fig. 3 was generated
for each trial. We employed a grid search to pick the best values of
λ and τ (wherever applicable). Note that OGL and OGLR produce
better results than the standard lasso.
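The parameter selection described above is an oracle grid search: each candidate (λ, τ) is scored against the known ground truth of the simulated signal. A minimal sketch, assuming the reconstruction error is the ℓ2 distance and that the solver is passed in as a callable (`solve` is a hypothetical interface, not part of any library):

```python
import itertools
import numpy as np

def grid_search(solve, theta_true, lambdas, taus=(None,)):
    """Return (best_error, best_lambda, best_tau) over a parameter grid.

    solve(lam, tau) -> estimate of theta_true; tau is None for methods without
    it (lasso, OGLR). The error is ||theta_true - estimate||_2 (assumed norm).
    """
    best = (np.inf, None, None)
    for lam, tau in itertools.product(lambdas, taus):
        err = np.linalg.norm(theta_true - solve(lam, tau))
        if err < best[0]:
            best = (err, lam, tau)
    return best
```

For the lasso and OGLR, the default `taus=(None,)` can be used and τ ignored inside the solver; a curve like those in Figs. 5 and 6 would then average the best error over the trials at each noise level.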



[Plot: reconstruction error in the presence of noise]

Fig. 6. Effect of varying the noise variance. Groups are formed
according to Fig. 1(b).

Fig. 7. Effect of varying the number of measurements taken. Note
how the overlap group lasso needs far fewer measurements than the lasso
to achieve low errors. As the number of measurements is increased
beyond 300, neither method improves significantly.



4.2. Varying the number of measurements

The group penalties also reduce the number of compressed sensing
measurements needed to reconstruct images. To study this effect, we
varied the number of rows of the measurement matrix. The inputs were random
piecewise-constant signals of length 1024 with at most 5 jumps
placed uniformly at random (the 1-D equivalent of Fig. 3), and the
associated groups were determined using the binary tree structure of
the 1-D DWT, with each group corresponding to a single parent-child
pair (an edge of the tree, as in Fig. 1(b)). We observed that far
fewer measurements are needed for robust recovery of signals when
this implicit group structure is assumed than
for the conventional lasso (see Fig. 7).



5. CONCLUSIONS

The proposed group penalties match the sparsity patterns of wavelet
coefficients in natural images better than simple coordinate-wise
penalties such as the ℓ1 (lasso) penalty. Like the ℓ1 penalty, group
penalties yield linear inverse problems that are convex optimizations
that can be solved efficiently and exactly. Traditional Markov tree
models for coefficients do not lead to convex optimizations. The
non-separable nature of the group lasso (when groups overlap)
was addressed by devising a new optimization criterion that
can be solved by standard methods based on separable penalties.
The experiments demonstrate the performance gains of the group
penalty methods compared to the lasso.